
    POSH: Paris OpenSHMEM: A High-Performance OpenSHMEM Implementation for Shared Memory Systems

    In this paper we present the design and implementation of POSH, an open-source implementation of the OpenSHMEM standard. We present a model for its communications and prove some properties of the memory model defined in the OpenSHMEM specification. We present performance measurements of the communication library featured by POSH and compare them with an existing one-sided communication library. POSH can be downloaded from http://www.lipn.fr/~coti/POSH. Comment: This is an extended version (featuring the full proofs) of a paper accepted at ICCS'1

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization (one of the main dense linear algebra kernels) of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to combine a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).

    MPI Applications on Grids: A Topology-Aware Approach

    Large grids are built by aggregating smaller parallel machines through a public long-distance interconnection network (such as the Internet). Therefore, their structure is intrinsically hierarchical. Each level of the network hierarchy offers performance that differs from the other levels in terms of latency and bandwidth. MPI is the de facto standard for programming parallel machines and is therefore an attractive solution for programming parallel applications on this kind of grid. However, because of the aforementioned differences in communication performance, an application that continuously communicates back and forth between clusters suffers a significant performance impact. In this report, we present an extension of the information provided by the run-time environment of an MPI library, a set of efficient collective operations for grids, and a methodology to organize communication patterns within applications with respect to the underlying physical topology, and we implement it in a geophysics application.

    A task-based approach to parallel parametric linear programming solving, and application to polyhedral computations

    Parametric linear programming is a central operation for polyhedral computations, as well as in certain control applications. Here we propose a task-based scheme for parallelizing it, with quasi-linear speedup over large problems. This type of parallel application is challenging, because several tasks might be computing the same region. In this paper, we present the algorithm itself with a parallel redundancy elimination algorithm, and conduct a thorough performance analysis. Comment: arXiv admin note: text overlap with arXiv:1904.0607

    Data Coherency in Distributed Shared Memory

    We present a new model for distributed shared memory systems, based on remote data accesses. Such features are offered by network interface cards that allow one-sided operations, remote direct memory access and OS bypass. This model leads to new interpretations of distributed algorithms, allowing us to propose an innovative technique for detecting race conditions based only on logical clocks. Indeed, the presence of (data) races in a parallel program makes it hard to reason about and is usually considered a bug.